== Binary Trees ==

UPDATE: Apparently the means by which these trees are stored in the file is
called canonical Huffman code.

Each chunk header can define up to 3 binary trees that are used to more
concisely encode numbers into the input data. Rather, the trees are stored in
the data for decoding those concise numbers. The first tree is used when
reading raw byte counts, the second for when reading distance/length lengths,
and the third for when reading distance/length distances. More on those later.

Note: For the uninitiated, you can mask out the lower 4 bits of a value by
performing a bitwise AND with the number 15.

The lower 4 bits of the first byte read for a binary tree indicates how many
values are to be added to the tree, and those values describe how to construct
that tree. These values are all 4 bits, and can be decoded with the following
algorithm:

  * Read a byte from the input into CURBYTE
  * Copy the lower 4 bits of CURBYTE into COUNT
  * If COUNT is 0
    + This binary tree will not be used
  * Otherwise
    + Add 2 to COUNT
    + Create an array NIBBLES that contains COUNT elements, with the first
      element being at index 0 in the list
    + LOOP: Iterate X from 0 to COUNT - 1
      - If X is even, divide CURBYTE by 16 (or shift CURBYTE right by 4)
      - If X is odd, read the next byte from the input into CURBYTE
      - Copy the lower 4 bits of CURBYTE into NIBBLES[X]

For example, the byte string 22 31 03 will yield 4 nibbles: 2, 1, 3, 3 and the
0 is just padding.

The final structure of the binary tree is not known until all of the nibbles
are processed. Note that any given node in a binary tree can either be a
terminal node with a value, or a branching node that has two child nodes. These
are the only two types of nodes that can exist in a binary tree, and the
topmost node that connects everything else is called the root node.

Initially, the root node is set to be a branching node and its child nodes are
of unknown type (in regards to terminal or branching). The root node is at the
top of the tree, meaning it has no "depth" into the tree. For this
documentation, I'll refer to this as having a "depth level" of 0. The two child
nodes of the root node have a depth level of 1. If they have child nodes, their
children will be at depth level 2 and so-on.

The initial tree looks like this:

  Level 0    root
            0/  \1
  Level 1   ?    ?

To determine the significance of these nodes, the list of nibbles is searched
in order to find one where the nibble value is equal to the depth level of the
node in question.

IMPORTANT: The order in which the nodes are considered is determined by the
preorder traversal algorithm, where "left" corresponds with the 0 side (not the
1 side).

Consider the list of nibbles from earlier:

  Nibble Value: 2 1 3 3
  Nibble Index: 0 1 2 3

In this case, the first node of unknown significance is the root node's 0 child
(according to preorder traversal), which is in depth level 1. Therefore, a
nibble with a value of 1 is sought. There is one, so the nibble's index becomes
the node's value--in this case also a 1--and the nibble is processed in such a
way so that it will not be considered in future searches: it is effectively
"crossed out."

At this point, the tree looks like this:

  Level 0    root
            0/  \1
  Level 1  (1)   ?

  Nibble Value: 2 x 3 3
  Nibble Index: 0 1 2 3

Note that the value of the nibble at index 1 has been crossed out.

When the list of nibbles is searched again for another nibble with a value of 1
(for the current depth level), none can be found because the only remaining
nibbles contain 2, 3 and 3. In this situation, the node becomes a branching
node and that node's 0 child is the next to be considered (according to the
preorder traversal algorithm):

  Level 0     root
             0/  \1
  Level 1   (1)   @
                0/ \1
  Level 2       ?   ?

  Nibble Value: 2 x 3 3
  Nibble Index: 0 1 2 3

The remainder of the nibbles and nodes are processed in the same manner--again,
according to preorder traversal--and the remaining steps look like this:

        =Step 3=                =Step 4=

  Level 0     root        Level 0    root
             0/  \1                 0/  \1
  Level 1   (1)   @       Level 1   (1)  @
                0/ \1                  0/ \1
  Level 2      (0)  ?     Level 2     (0)  @
                                         0/ \1
                          Level 3        ?   ?

  Nibble Value: x x 3 3   Nibble Value: x x 3 3
  Nibble Index: 0 1 2 3   Nibble Index: 0 1 2 3

        =Step 5=                 =Step 6=

  Level 0    root          Level 0     root
            0/  \1                    0/  \1
  Level 1   (1)  @         Level 1   (1)   @
               0/ \1                     0/ \1
  Level 2     (0)  @       Level 2      (0)  @
                 0/ \1                     0/ \1
  Level 3       (2)  ?     Level 3        (2) (3)

  Nibble Value: x x x 3    Nibble Value: x x x x
  Nibble Index: 0 1 2 3    Nibble Index: 0 1 2 3

Note: Nibbles need not define any values for a particular depth level. In fact,
in most situations, level 1 will not contain any definitions and thus both of
the child nodes of the root node themselves become branching nodes. It's
important to stick to the preorder traversal algorithm and attempt to use the
nibbles to assign values at each node, one step at a time.

Note: If a given nibble index doesn't need to appear in the binary tree as a
node value, the nibble value at that index can be 0. Since a nibble specifying
depth level 0 will never be searched for, this will effectively force that
index to be skipped. For the same reason, nibbles can be "crossed out" by
setting their values to 0.

Note: Data errors can easily make the tree impossible to construct. If, when
searching, all of the nibbles have a value less than the current depth level,
the process should be aborted before an endless loop and a memory leak occur.

Note: The process ordinarily terminates when there are no further nodes of
unknown significance. In the event there are still remaining, "unused" nibbles,
they are simply ignored.


== The Compression Algorithm ==

The compression algorithm itself is a type of ROLZ that, in addition to
encoding repeating strings of bytes in distance/length references (which is the
technique used by LZ77), is able to reduce the size of literal numbers in the
data by using a binary tree, presumably constructed with the Huffman technique.
This particular implementation, however, is quite restrictive and will lose out
over conventional compression algorithms for larger files. For the small data
in Lemmings Revolution, however, this variant of ROLZ will generally
out-perform DEFLATE, the algorithm used in ZIP files, for total compression
ratio.

The general decompression algorithm looks like this:

  * Initialize the decoder to the RAW state
  * LOOP: Begin decoding until all output data is processed or an error occurs
    + If the current decoder state is RAW
      - Read a number from the input
      - Read that number of bytes (8 bit values) from the input to the output
    + If the current decoder state is COPY
      - Read a number from the input for LENGTH and add 2
      - Read a number from the input for DISTANCE and add 1
      - From the current position in the output buffer, start reading from
        DISTANCE bytes earlier and copy LENGTH bytes to the end of the output
    + Alternate decoder states: if it's RAW, switch to COPY and vice-versa

Note: It's totally valid for LENGTH to be greater than DISTANCE. All that means
is that it will copy bytes that have already been copied during the current
operation. When repeating a single byte many times, DISTANCE can just be 1 and
LENGTH can be however many copies you want.


== Reading Bits ==

For the purpose of reading bits from the input data, the first bit is the
highest bit (2^7) of the first byte of the input data. The second bit is 2^6
and so-forth. The ninth bit will be the highest bit of the second byte, and
this continues through the input data. For example:

  * Input data (as hex): 9A 36 E3
  * Input data (as bits): 10011010 00110110 11101011
  * To read 3 bits will return 100
  * To read another 4 bits will return 1101
  * To read another 10 bits will return 0001101101


== Reading Numbers ==

To "read a number from the input," as the algorithm above indicates, follows a
particular convention. Numbers can be read in one of two ways, depending on
whether or not there is a binary tree present for the purpose of the number
being read (raw, length or distance). This is determined by the number of
nibbles read for the tree: if the number was 0, then the tree is not present.

If no tree is present for the current purpose, then numbers are read entirely
from the input and conform to a "normal" format.

If a tree is present for the current purpose, then the contents of that tree
will dictate what bits are read from the input and what the value of the number
will be.


== The Normal Number Format ==

The Normal format for numbers is as follows:

  * Initialize COUNT to 0
  * LOOP: Begin reading bits from the input until the bit read is a 0
    + If the bit is a 1, increment COUNT
  * Read an additional COUNT bits from the input into BITS
  * The value of the number is 2 ^ COUNT - 1 + BITS

Note: If COUNT reaches 16 (perhaps 17?), then the data contains an error and
the loop should be aborted. This isn't a violation of the format, per s, but
it will cause the 16-bit Lemmings Revolution implementation to overflow.


== The Tree Number Format ==

The Tree format for numbers is as follows:

  * Initialize NODE to the root node of the appropriate binary tree
  * LOOP: Read 1 bit at a time from the input until NODE is a terminal node
    + If the bit is a 0, set NODE to its 0 child node
    + If the bit is a 1, set NODE to its 1 child node
  * Copy the value of NODE into PARAM
  * If PARAM is less than 2
    + The value of the number is PARAM
  * Otherwise
    + Subtract 1 from PARAM
    + Read PARAM bits from the input into BITS
    + The value of the number is 2 ^ PARAM + BITS
